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Abstract. Sudakov-type distributions are at the heart of generating radiation in parton showers as well as 
contemporary NLO matching algorithms along the lines of the POWHEG algorithm. In this paper, the 
C++ library ExSample is introduced, which implements adaptive sampling of Sudakov-type distributions 
for splitting kernels which are in general only known numerically. Besides the evolution variable, the 
splitting kernels can depend on an arbitrary number of other degrees of freedom to be sampled, and any 
number of further parameters which are fixed on an event-by-event basis. 
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1 Introduction 

Parton shower Monte Carlo simulations as implemented 
in [TH3, just to name few of the recently developed codes, 
require a way to draw random variates from a probability 
density 



dSp(n,q\Q;z;^) 



dq d n z 



= A P (ji\Q;t)6{q-n)- 



0(Q - q)0(q - n)P(q; z; 0A P (q\Q; £) (1) 



when evolving from a hard scale Q to a soft scale q in the 
presence of an infrared cutoff fj, 7 below which no radiation 
occurs. Here, A P (q\Q;£) is the Sudakov form factor, 



Ap(q\Q;0 = exp 



P(k;z;£)d n z dk 



(2) 



and P(q;z;£) > is the splitting kernel describing the 
dynamics of radiation at a scale q, along with n other 
kinematic parameters z — (zi,...,z n ) and in dependence 
on any further parameters £ = (£i, £ m ). Examples of 
these parameters are momentum fractions of incoming 
partons or invariant masses of the partonic configuration 
from which the next emission is to be generated. The most 
complicated information in terms of additional parame- 
ters is certainly given by the full information on a phase 
space point of a Born-type event from which real emis- 
sion is to be generated in the context of matrix element 
corrections [1HH] or NLO matching using the POWHEG 
method which is originally described in [S] . We refer to the 
probability density defined in eq. [T] as the Sudakov-type 
distribution associated to P. 

Drawing random variates from dSp by standard meth- 
ods is in general not feasible, as the integral entering the 



Sudakov form factor would have to be evaluated numeri- 
cally, and interpolated. Though this is indeed being done 
for example in the FORTRAN version of HERWIG [TU], 
this method ceases to be applicable if the number of ad- 
ditional degrees of freedom or in particular the number of 
additional parameters become large. 

To this extent, current parton shower implementations 
reside on the Sudakov veto algorithm which, e.g. has been 
discussed in [4l lTTlfl"3] . The Sudakov veto algorithm re- 
quires an overestimate R to the splitting kernel of interest 
P, R(q; z; £) > P(q; z; £), and is defined by 

Qstart Q 

loop 

solve rnd= A R (q\Q stalt ;£)6(q - fx) for q 
draw z from R(q; z\ £) 
if q = fx then 

return (p, z) 
else 

return (q,z) with probability P(q; z; £)/R(q; z; £) 
end if 

Qstart q 

end loop 

where rnd denotes a source of random numbers uniformly 
distributed on [0, 1]. Obviously, R needs to be of a simple 
form in such a way that the first step in the loop can easily 
be implemented. 

Finding such an R has up to now always required 
knowledge of properties of the target kernel P, making a 
general-purpose implementation of the algorithm impos- 
sible. Especially towards more complicated splitting ker- 
nels, this manual procedure of determining R from the 
properties of P may not be possible at all: even analytic 
expressions may not be known, P being available only nu- 
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merically. A general implementation may also further en- 
hance flexibility when changing parton distribution func- 
tions in the parton shower backward evolution and thus 
the respective splitting kernels. 

The purpose of ExSample (a shorthand for Exponential 
Sampler) is to provide such a general purpose implementa- 
tion, by adaptively obtaining an overestimate to the target 
splitting kernel in such a way as to optimize the algo- 
rithm's overall performance. 



2 Generation of Adapting Overestimates 

ExSample is very much inspired by the ACDC and FOAM 
algorithms implemented in |141I15] . By the same reason- 
ing, ExSample makes use of 'cells', which represent a sub- 
hypercube of the volume spanned by the evolution vari- 
able q, the additional degrees of freedom z and external 
parameters £. Cells are organized in a binary tree, each 
cell having either two or no children, in the latter case 
terminating the tree at this branch. The union of the two 
hypercubes Ub and U c represented by the two children cells 
Cfc iC always equals the hypercube Ua, c ) represented by the 
parent cell cym ■ Each cell c contains the maximum of the 
target splitting kernel P encountered by a presampling 
as its value w c . The leaf cells of the tree, constituting a 
certain fractal-type partition of the sampling volume into 
hypercubes, define the overestimate function, 



R(q;z;£) 



E 



w c 0((q;z;C)^U c ) 



(3) 



leaf cells c 



Each parent cell keeps track of the integrals of its children 
cells, I c _b — u> c ,bVolume([/b iC ). This allows for an efficient 
sampling of the overestimate function, by selecting either 
of the children cells according to their integral, biased by 
constraints imposed due to the selected evolution variable, 
the externally fixed parameter point and the need to com- 
pensate for newly encountered maxima. 

The next value of the evolution variable is easily gen- 
erated by keeping track of projections of the overestimate 
kernel onto the evolution variable dimension in depen- 
dence on the externally fixed parameter point. In order to 
keep track of the dependence on the additional parameters 
£ as well as the starting value of the evolution variable Q, 
ExSample provides a mechanism to calculate unique hash 
values identifying the sub tree of the cell structure which 
should be considered for a given parameter point. All in- 
formation needed to sample events, i.e. in particular pro- 
jections of the overestimate kernel R and the number of 
'missing' events per cell, to be discussed in section [3l can 
be accessed in dependence on these hash values. The basic 
structure of the sampling is sketched in figure [1] 

The root cell of the tree spans the whole sampling vol- 
ume and is the only cell present at the initial stage of 
the algorithm. Children cells are produced in an adaption 
step, iteratively building up the cell tree through splitting 
a cell into two children cells. This procedure aims at im- 
proving the algorithm's efficiency along with gaining more 
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Fig. 1. A sketch of the algorithm for an evolution variable q, 
one additional variable z, and no further parameters £. The 
top of the figure shows how the leaf cells (in the third plane 
from the top, shown here after two cell splits) are organized 
in a binary tree structure starting from the root cell (7((i2)3). 
The bottom of the figure sketches the overestimate R. To the 
left of the overestimate, the Sudakov exponent corresponding 
to R, F(q) — — lnZi_R(g|l) is shown. Here we assume that the 
absolute upper bound on the evolution variable is q < 1, thus 
the first step to draw an event starting from a scale Q is to solve 
s(Q) = — lnrnd + F(Q) = F (q) for q (indicated by the dashed 
blue line) . A z value is then sampled in the cells containing the 
q value just chosen: The cell integrals over z are computed to 
only reflect the subtree consisting of the black arrows, and the 
tree structure is traversed only along the corresponding paths, 
selecting either of the children cells with weight given by the 
respective integral. Within the boundaries of a leaf cell selected 
by this procedure, a z value is drawn flat. This corresponds to 
drawing a z value from the distribution sketched by the solid 
blue line, the overestimate R at fixed q. 



detailed information on the target splitting kernel, i.e. a 
more fine-grained overestimate closer to it. 

In order to achieve this, each cell always monitors its 
efficiency, which is defined as the ratio of the number of 
accepted points divided by the number of proposed points 
and thus gives a measure of the overall performance of 
the Sudakov veto algorithm. If this efficiency drops be- 
low a user-supplied threshold, the cell is considered 'bad'. 
With a frequency decreasing as the efficiency of the algo- 
rithm increases, and on encounter of a bad cell, a potential 
splitting of the cell is determined to further increase the 
efficiency of the algorithm. 

To obtain an optimal hyper-plane along which the cell 
should be split, each cell histograms projections of the av- 
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erage target kernel value onto each variable dimension k, 
(P)k(xk)- The dimension k (which here may refer to ei- 
ther the evolution variable, one of the additional degrees of 
freedom or any of the external parameters) orthogonal to 
this hyperplane, and the split point x k (out of a number of 
possible split points well inside the cell's boundaries) de- 
fined by the intersection of the hyperplane and this direc- 
tion are determined to maximize a 'gain' measure, defined 
as 



;ain fc (a; fc ) 



Q(P) k (x)dx-Q(P) k (x)dx 



r? (p) k ( X )dx 



(4) 



Here, x k denote the cell's boundaries in the variable x k . 
For reasons of performance and simplicity, the current im- 
plementation uses a two-bin histogram per dimension, and 
Xk — {x^ + x^")/2, leaving only the choice of dimension 
to maximize the gain measure: If the behaviour of P is 
rather flat when projected on one dimension k, this di- 
mension will receive a small gain measure, and projections 
showing more variance in P are more likely to be split 
along. Again, a user-supplied parameter can steer the be- 
haviour of the adaption by considering only those splits 
to be worth performed, if the gain exceeds some value. 

Out of the two children cells the target density is being 
presampled in that cell which did not contain the maxi- 
mum point used before to get a new estimate of the max- 
imum. The number of presampling points per cell is an- 
other user-defined parameter. The choice of this param- 
eter has to be carried out in view of the compensation 
procedure to be defined in the next section with a trade- 
off between the time needed for presampling and the time 
lost by the number of events to be vetoed by the compen- 
sation procedure. There is no general rule on how it is to 
be determined. Experience gained so far shows that few 
thousand presampling points are an acceptable compro- 
mise. 



3 Compensating for New Maxima 

Since the true maximum of the target kernel can never 
be determined with probability one from the presampling 
procedure, care has to be taken on what constraints need 
to be imposed on the sampling procedure once a point has 
been encountered exceeding the currently used maximum. 
For a sufficiently large number of presampling points one 
may reside on the statement that these points are rare 
and generated distributions will not show any effect on 
the erroneous overestimate. Thinking about the overall 
efficiency of the algorithm in performing its function of 
acting as a continuous source of unweighted events with 
the smallest possible overhead, this is certainly not a cri- 
terion to base an implementation on. 

To define the method of compensation, we first intro- 
duce the notion of missing events in a given cell. As for 
the cell's integral, each parent cell carries the sum of the 
missing events of its children cells. The number of missing 
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Fig. 2. A sketch of the algorithm in a setup similar to figure[T] 
now sketching the situation upon encounter of a new overes- 
timate. The new overestimate gave rise to different numbers 
of events expected in each cell (solid rectangles in the lower 
part), as compared to the number of events expected with 
the old overestimate (dotted rectangles). The difference be- 
tween these determines the number of missing events per cell 
(see text for more details). In the sketch given here, the cells 
U2 and U3 would receive a positive number of missing events 
(forcing sampling in these cells as indicated by the black ar- 
rows), whereas cell Ui would contain a negative count of miss- 
ing events, triggering vetoes of attempts to sample points in 
this cell (indicated by the red arrow). 



events is not limited to be positive. In case it is positive, 
the corresponding cell needs to be oversampled, i.e. the 
algorithm is forced to sample events in cells with a pos- 
itive number of missing events, lowering this number in 
the selected cell if it is larger than zero. Oversampling is 
imposed on the algorithm as long as there are cells with a 
positive count of missing events. Conversely, if the missing 
event count is negative, a cell needs to be undersampled. If 
such a cell is selected, its missing event count is increased, 
if it is smaller than zero and the selection is vetoed, trig- 
gering a new cell selection. The behaviour of the algorithm 
in a compensating state is illustrated in figure [2] 

Upon encounter of a new maximum w' c > w c , the num- 
ber of missing events associated to this change is calcu- 
lated for each cell as 



Aw** = Nr [E£ 



1 



(5) 



Here, N c is the number of proposal events already gen- 
erated in the cell, and p c (p' c ) denotes the probability to 
select cell c using the old (new) overestimate value for 
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events above the infrared cutoff. p c is calculated from the 
knowledge of projections of the overestimate kernel in de- 
pendence on the additional parameter point £ and the 
hard scale Q as 



Pc 



J c R(q;z;OA R (q\Q;Qd n zdq 
l-A R (n\Q;£) 



(6) 



^ymiss j g ^hen added to each cell's current missing event 
count. Note that undersampling, jV" llss < appears in 
the cells not containing the newly encountered maximum 
owing to the change in normalization of the overestimate 
density for events above the infrared cutoff. Eq. [5] ensures 
that within the currently accumulated statistics proposal 
events are always distributed according to the last encoun- 
tered maximum, provided the algorithm has been stopped 
in a state where it is not anymore forced to perform over- 
or undersamplings. This is evident by rewriting eq. [5] as 



((N') C -(N) C ) 



(7) 



where {N) c — Np c ((N') c — Np' c ) is the number of ex- 
pected events in cell c for the total number of generated 
events, N. The difference in brackets is the number of 
missing events in the absence of fluctuations due to a fi- 
nite number of generated events, and the factor in front of 
it takes into account the currently accumulated statistics, 
i.e. how much the population of the cell differs from its 
expected population. 



Algorithm 1 The cell selection algorithm. 

calculate sub tree hash h(Q; £) and collect projections 
loop 

solve rnd= An(q\Q; £,)8(q — fi) for q 
if q — fi then 

return event at infrared cutoff 
end if 

collect cell integrals and missing event counters 

cell a— root cell 

while cell is not a leaf do 

if AfflJg S t s Child ( ccll ) > A A r ™ < !.Q S ndChild ( cc]1 ) < then 
cell <- firstChild(cell) 



else if N, 



< A TVf 



firstChild(ccll) 

secondChild (cell) 



secondChild(ccll) 

cell 
else 

if rnd < larstchiid(cell) A;eli then 

cell <- firstChild(cell) 
else 

cell <— secondChild(cell) 
end if 
end if 
end while 
if jV™;r = then 

return cell 
else if JV£jf > then 

Armiss . Armiss -i 
JV ccll iV coll — 1 

return cell 
else if iV^jf < then 
1 



> then 



v cell 

Armiss , Armiss 



end if 
end loop 



4 The Cell Selection Algorithm 

In this section the complete algorithm to generate events 
as implemented in ExSample is defined. We here skip those 
parts connected to monitoring the efficiency of and split- 
ting a cell. Proposal events according to dSn(q\Q; z; £) as 
required by the Sudakov veto algorithm are generated by 
first deciding, if there has been an event at the infrared 
cutoff or otherwise selecting a proposal cell according to 
algorithm [T] 

Once a proposal cell has been selected, a proposal event 
is drawn by sampling the remaining degrees of freedom z 
in the selected cell with uniform distribution. Except for 
the compensating cell selection algorithm outlined above, 
the Sudakov veto algorithm proceeds without modifica- 
tion. 



5 Examples and Validation 

ExSample has been validated for various 'toy' splitting ker- 
nels and within the realistic application of a parton shower 
and POWHEG matching implementation. In this section 
we present simple examples of distributions obtained by 
using ExSample, mainly to illustrate the basic functional- 
ity. 

Figure [3] shows the results obtained by the adaptive Su- 
dakov veto algorithm, using a kernel density showing the 



generic behaviour of a QCD splitting function with run- 
ning a s . Perfect agreement with a numerical integration 
is found. In addition, figure U shows the functionality of 
the compensation procedure by comparing results for the 
same distribution but different numbers of presampling 
points used in the algorithm, which are all consistent with 
each other. 

In figure [5] the results of sampling a Sudakov-type 
distribution in the presence of additional parameters are 
shown. In this example, a quark splitting function mul- 
tiplied by a power law in x/z has been used, where x is 
the additional parameter and z is the momentum frac- 
tion variable to be sampled. The sampled distributions 
in various bins of the additional parameter x have been 
compared to a numerical integration. Full agreement has 
been found here. The presence of adaption splits in the 
parameter dimension has explicitly been checked for this 
example. 

We also use this example, which closely resembles ini- 
tial state backward evolution of a parton shower at small 
values of the momentum fraction x, to asses the improve- 
ments obtained by the adaptive sampling algorithm. Par- 
ticularly, we count the number of vetoed points encoun- 
tered when requiring the same number of events while lim- 
iting the allowed number of cell splits. This way, a direct 
comparison of very coarse to increasingly finer overesti- 
mates is performed. The results are presented in figure [BJ 
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Fig. 3. A Sudakov-type distribution with a QCD splitting 
function type kernel density as sampled by ExSample using 
the adaptive Sudakov veto algorithm. The vertical axis corre- 
sponds to the evolution variable q, the horizontal to a vari- 
able similar to a momentum fraction. Shown are few sampled 
events, projections of the generated distribution versus the re- 
sult from a numerical integration, and the the cell grid pro- 
duced. 
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Fig. 4. The same distribution as shown in the upper left panel 
of figure[3l now sampled with a different number of presampling 
points proving functionality of the compensation procedure. 



showing an exponential improvement with the number of 
splits performed. 
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Fig. 5. Distributions for a Sudakov-type distribution using 
a quark splitting function, multiplied by a power law in x/z. 
Shown are the sampled distribution for the evolution variable 
q and momentum fraction z in various bins of the additional 
parameter x. The distributions are compared to a numerical 
integration, proving full functionality of the sampling in pres- 
ence of additional parameters. 



hard scattering process, can be dealt with in full general- 
ity. ExSample has been validated in 'toy' as well as realistic 
setups, showing full functionality of the implementation. 



The sampling of Sudakov-type distributions is at the heart 
of all current parton shower and POWHEG NLO match- 
ing implementations. In this paper we have introduced the 
C++ library ExSample, which targets at adaptive sam- 
pling of these distributions as defined from a splitting ker- 
nel, which - in general - may only be known through a 
function call. 

Additional parameters, such as typically encountered 
dependencies on incoming parton momentum fractions or 
the full dependence on a phase space point governing a 
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# splits 

Fig. 6. Performance of the algorithm measured as a ratio of the 
number of vetoed points to the number of events requested as 
a function of the number of cell splits allowed. An exponential 
improvement is seen as more and more splits are considered. 
In this example, requesting 500000 events, a maximum of 27 
splits occured. The very efficient region from 18 splits onward 
with below three vetoes per generated event has been reached 
after about 40000 generated events. 
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A Availability and Installation 

ExSample is available from 

http : //www . desy . de/~ platzer/ software/ 
exsample-1 . .tar . gz 

It is a complete header based library, depending ad- 
ditionally only on the presence of boost headers [IB] , An 
installation procedure is thus not required except for mak- 
ing the ExSample headers available to the client code dur- 
ing compilation by including the header file exsample.h. 
ExSample is published under the GNU General Public Li- 
cense version 2 [17] and can thus be freely used and redis- 
tributed. 

The distribution contains extensive documentation, sev- 
eral examples of usage, as well as an implementation for 
standard sampling and adaptive Monte Carlo integration, 
of which ExSample is capable as well. 
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